# Data Preparation - Aggregate total number of refugees by year and convert to thousands (K)
total_refugee_trends <- population_data %>%
  group_by(year) %>%
  summarise(total_refugees = sum(refugees, na.rm = TRUE) / 1000) # Convert to thousands

Examining Global Refugee Trends and Correlations Over the Years
INFO 526 - Project 1
Abstract
Using the dataset provided by the {refugees} R package, this project examines worldwide forced migration from 2010 to 2022. The dataset contains more than 64,000 entries drawn from UNHCR, UNRWA, and IDMC and covers refugees, internally displaced people (IDPs), asylum seekers, stateless people, and other groups. We investigate the variables that drive forced migration, from socioeconomic turbulence to environmental disasters and geopolitical upheavals.
Two key topics form the core of our investigation: how changing political ideologies in the US affect refugee trends, and how changes in external occurrences such as pandemics, conflicts, and climate change affect refugee numbers globally. Using time-series analysis and additional statistical techniques, we examine displacement patterns and correlate them with major political changes and global crises in order to identify underlying trends and causal relationships. In addition to tracking the rise and fall of international migration, this project aims to position these movements in the broader context of international politics, environmental issues, and global health concerns. Our objective is to present a comprehensive picture of the factors influencing worldwide migration patterns by integrating data on refugee movements with external socio-political and environmental events. This study is essentially a deep dive into the dynamics of displacement, providing insights into how international events and policies affect vulnerable populations’ movements. It serves as evidence of the interdependence of worldwide phenomena and their significant influence on human lives, highlighting the necessity of thoughtful, caring answers to the difficulties associated with forced migration.
Introduction
This project analyses more than 64,000 records from the {refugees} R package, covering the period from 2010 to 2022, in order to investigate the dynamics of worldwide displacement. The dataset is derived from UNHCR, UNRWA, and IDMC and describes refugees, internally displaced people (IDPs), asylum seekers, and stateless people, among others. The goal is to identify how external factors, including geopolitical conflicts, natural disasters, and sociopolitical upheavals, interact with patterns of global displacement.
The impact of US political positions on refugee trends and the general changes in worldwide refugee populations as a result of external factors like wars, climatic shifts, and health crises like the COVID-19 pandemic are central to our research. We want to analyse the complex relationship between major world events and displacement trends using time-series analysis and statistical evaluations. This study aims to map out the ebb and flow of worldwide displacement while also placing these movements within the broader context of world events. This will help to better understand the forces driving migration, the interconnections of world events, and the effects these events have on displaced populations.
Question 1: How have the patterns in refugee populations evolved over time, and how have the stances of American political parties impacted these developments? Analyse the dynamics of refugee migration in relation to the US political environment.
Introduction
Our project’s first question explores the interplay between international refugee movements and political processes in the United States. We seek to understand how changes in political party ideologies and governmental positions inside the United States may have affected patterns in refugee populations, using the data from 2010 to 2022. This investigation is based on the hypothesis that political discourse and policy choices in the United States can significantly affect worldwide displacement patterns, either by reshaping the geopolitical environment or by changing the country’s own refugee admission policies.
Approach
We combine time-series analysis with a comparison of political eras in the United States. For clarity and consistent scaling in our visualizations, we first aggregate the total number of refugees by year and express the totals in thousands. Overlaying important U.S. political events and changes of administration on these yearly totals gives a chronological account of refugee movements. Using ggplot2, we produce bar plots and trend lines that show how refugee populations relate to changes in U.S. politics. We also include data on returned refugees, which gives insight into the cyclical nature of displacement and resettlement.
Data Preparation and Pre-processing
The data preparation stage transforms our dataset into a format suited to analysis. Using the R dplyr package, we group the data by year and sum the number of refugees, dividing the totals by 1,000 so that they are expressed in thousands. The same procedure is applied to the data on returned refugees, yielding a parallel dataset that shows return movements alongside the original displacements. We also convert ‘year’ to a numeric variable so that it can serve as an axis in both our static and animated visualizations. This processing prepares the data for the investigation of the relationship between U.S. political positions and international refugee trends.
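The preparation steps described above can be sketched as follows. This is a minimal illustration rather than the project's exact code: the small `population_data` tibble is a made-up stand-in for the real dataset, and the `returned_refugees` column name is an assumption about how returned refugees are stored.

```r
library(dplyr)

# Toy stand-in for population_data; the values are illustrative only.
# The column name returned_refugees is an assumption.
population_data <- tibble(
  year              = c("2010", "2010", "2011", "2011"),
  coo_name          = c("A", "B", "A", "B"),
  refugees          = c(1500, 2500, 3000, NA),
  returned_refugees = c(100, 200, 50, 150)
)

yearly_totals <- population_data |>
  # make year numeric so it can be used as a continuous axis
  mutate(year = as.numeric(year)) |>
  group_by(year) |>
  summarise(
    # sum per year, ignoring missing values, and express in thousands
    total_refugees = sum(refugees, na.rm = TRUE) / 1000,
    total_returned = sum(returned_refugees, na.rm = TRUE) / 1000,
    .groups = "drop"
  )

print(yearly_totals)
```

Computing both totals in a single `summarise()` call keeps the displacement and return series aligned on the same years.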
Refugee populations over time.
The graphic illustrates the changes in overall refugee populations over time using a bar graph that shows annual fluctuations. It also includes a trend line that indicates a quadratic growth pattern.
A. Expeditious growth of the refugee population over time
Plot 01 shows the rapid growth of the refugee population over time. Sky-blue bars combined with an orange trend line highlight the changes in refugee figures from the early 2010s to 2022. The y-axis, which represents the number of refugees in thousands, is labelled with comma separators for clarity.
B. A small number of refugees have returned over a period of time
In contrast, Plot 02 shows that only a small number of refugees returned over the period covered. Together, the two plots illustrate the dynamic nature of refugee populations: the refugee population keeps rising while returns stay low, which underlines the need for global attention and action in response to refugee crises.
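A bar chart with a quadratic trend line, as described for Plot 01, could be built along the following lines. This is a sketch under assumptions: the yearly values are made up for illustration, and fitting the trend with `geom_smooth()` and a second-degree polynomial is one plausible way to obtain the quadratic curve, not necessarily the method used for the original figure.

```r
library(ggplot2)

# Made-up yearly totals (in thousands), for illustration only
total_refugee_trends <- data.frame(
  year           = 2010:2015,
  total_refugees = c(1000, 1200, 1500, 2000, 2700, 3600)
)

p <- ggplot(total_refugee_trends, aes(x = year, y = total_refugees)) +
  # yearly totals as sky-blue bars
  geom_col(fill = "skyblue") +
  # quadratic trend: linear model on a second-degree polynomial of year
  geom_smooth(
    method  = "lm",
    formula = y ~ poly(x, 2),
    se      = FALSE,
    color   = "orange"
  ) +
  # comma separators on the y-axis labels
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Year", y = "Refugees (thousands)")
```

Printing `p` renders the plot. The same structure would apply to the returned-refugee series in Plot 02 by swapping the y variable.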
Refugees by year and country of origin.
This figure displays the total number of refugees originating from the top 20 countries of origin. Each country is shown as a horizontal bar, uniformly shaded in purple. The y-axis lists the countries, while the x-axis represents the overall number of refugees. The bars are ordered by size, so the countries with the largest refugee populations are easy to identify. This representation gives a quick overview of the worldwide refugee situation by showing which countries are most affected.
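The ranking behind such a chart can be sketched as follows. The data are made up, and the example keeps the top 3 rather than the top 20 so that the result is easy to check by hand; `slice_max()` is one reasonable way to select the largest totals and is an assumption about the approach.

```r
library(dplyr)
library(ggplot2)

# Made-up records, for illustration only
population_data <- tibble(
  coo_name = c("A", "A", "B", "C", "C", "D"),
  refugees = c(500, 700, 300, 900, 1000, NA)
)

top_origins <- population_data |>
  group_by(coo_name) |>
  summarise(total_refugees = sum(refugees, na.rm = TRUE), .groups = "drop") |>
  # keep the countries with the largest totals (the report uses the top 20)
  slice_max(total_refugees, n = 3)

p_top <- ggplot(
  top_origins,
  # reorder() sorts the bars so the largest total appears at the top
  aes(x = total_refugees, y = reorder(coo_name, total_refugees))
) +
  geom_col(fill = "purple") +
  labs(x = "Total refugees", y = "Country of origin")
```

Sorting the bars with `reorder()` is what makes the most affected countries stand out at a glance.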
Refugee populations over time with US political context.
This graph shows how the refugee population in the United States changed over time, with line color indicating the governing political party: blue for the years in which the Democratic Party held power and red for the periods of Republican Party leadership. The graph covers 2010 to 2022, with the x-axis representing the year and the y-axis the cumulative number of refugees. It allows the refugee population to be compared across administrations and points to the potential influence of political leadership on refugee admissions. The simple, uncluttered style makes the patterns in U.S. refugee admissions easy to read.
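One way to attach the political context to yearly data is sketched below. It assumes that each calendar year is assigned to the party holding the presidency for most of that year (Democratic for 2010 to 2016 and 2021 to 2022, Republican for 2017 to 2020). The `us_refugees` values are made up for illustration, and the column names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Made-up yearly values, for illustration only
us_trend <- tibble(
  year        = 2010:2022,
  us_refugees = seq(260, 380, by = 10)
) |>
  # label each year with the party of the sitting president (assumption:
  # the year is assigned to the party in office for most of that year)
  mutate(party = if_else(year >= 2017 & year <= 2020, "Republican", "Democratic"))

p_party <- ggplot(us_trend, aes(x = year, y = us_refugees)) +
  # group = 1 keeps a single connected line while the color varies by party
  geom_line(aes(color = party, group = 1), linewidth = 1) +
  scale_color_manual(values = c(Democratic = "blue", Republican = "red")) +
  labs(x = "Year", y = "Refugees", color = "Governing party")
```

Keeping the party as a column, rather than hard-coding colors per segment, makes it straightforward to summarise or compare values by administration as well as to plot them.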
Question 2: How does the global refugee population fluctuate across countries? Do external factors such as COVID-19, war, climate change, or financial stability have an impact on refugee populations?
Glimpse of the pre-processing function
# Function to pre-process the refugee dataset
# input  : dataset - tibble
#          unique_countries - tibble
# output : filtered_data - tibble
processRefugees <- function(dataset, unique_countries) {
  filtered_data <- dataset |>
    # keep only the country name, year and refugees columns
    select(coo_name, year, refugees) |>
    # append the countries that are not present in the population dataset for this year;
    # bind_rows() combines the rows of two data frames
    bind_rows(
      # anti_join() returns only the rows of the first dataset that have no
      # matching rows in the second dataset, based on the specified key columns
      anti_join(unique_countries, dataset, by = c("region" = "coo_name")) |>
        # set the year, and the number of refugees for that year to NA
        mutate(year = as.integer(dataset[1, "year"]), refugees = NA)
    ) |>
    # shorten the official country names
    mutate(
      coo_name = case_when(
        coo_name == "United States of America" ~ "USA",
        coo_name == "United Kingdom of Great Britain and Northern Ireland" ~ "UK",
        coo_name == "Iran (Islamic Rep. of)" ~ "Iran",
        coo_name == "Palestinian" ~ "Palestine",
        coo_name == "Serbia and Kosovo: S/RES/1244 (1999)" ~ "Serbia",
        coo_name == "Türkiye" ~ "Turkey",
        coo_name == "Congo" ~ "Congo",
        coo_name == "Dem. Rep. of the Congo" ~ "Democratic Republic of the Congo",
        coo_name == "Cote d'Ivoire" ~ "Ivory Coast",
        coo_name == "Central African Rep." ~ "Central African Republic",
        coo_name == "United Rep. of Tanzania" ~ "Tanzania",
        coo_name == "Russian Federation" ~ "Russia",
        coo_name == "Syrian Arab Rep." ~ "Syria",
        coo_name == "Bolivia (Plurinational State of)" ~ "Bolivia",
        coo_name == "Dominican Rep." ~ "Dominican Republic",
        coo_name == "Venezuela (Bolivarian Republic of)" ~ "Venezuela",
        coo_name == "Czechia" ~ "Czech Republic",
        coo_name == "Rep. of Korea" ~ "South Korea",
        coo_name == "Dem. People's Rep. of Korea" ~ "North Korea",
        coo_name == "Lao People's Dem. Rep." ~ "Laos",
        coo_name == "Viet Nam" ~ "Vietnam",
        coo_name == "China, Hong Kong SAR" ~ "Hong Kong",
        coo_name == "Netherlands (Kingdom of the)" ~ "Netherlands",
        coo_name == "Cabo Verde" ~ "Cape Verde",
        coo_name == "China, Macao SAR" ~ "Macao",
        coo_name == "Holy See" ~ "Vatican City",
        TRUE ~ coo_name
      )
    ) |>
    # create a categorical variable refugees_m that groups countries by their number of refugees
    mutate(
      refugees_m = case_when(
        refugees < 100 ~ "<100",
        refugees >= 100 & refugees < 500 ~ "100 to 500",
        refugees >= 500 & refugees < 1000 ~ "500 to 1000",
        refugees >= 1000 & refugees < 2000 ~ "1k to 2k",
        refugees >= 2000 & refugees < 3000 ~ "2k to 3k",
        refugees >= 3000 & refugees < 4000 ~ "3k to 4k",
        refugees >= 4000 & refugees < 5000 ~ "4k to 5k",
        refugees >= 5000 & refugees < 7000 ~ "5k to 7k",
        refugees >= 7000 & refugees < 10000 ~ "7k to 10k",
        refugees >= 10000 & refugees < 20000 ~ "10k to 20k",
        refugees >= 20000 & refugees < 50000 ~ "20k to 50k",
        refugees >= 50000 & refugees < 100000 ~ "50k to 100k",
        refugees >= 100000 ~ "100k+",
        is.na(refugees) ~ "NA"
      )
    ) |>
    # order the categories; every label produced above must appear in levels,
    # otherwise factor() turns it into a missing value
    mutate(
      refugees_m = factor(refugees_m, levels = c("<100",
                                                 "100 to 500",
                                                 "500 to 1000",
                                                 "1k to 2k",
                                                 "2k to 3k",
                                                 "3k to 4k",
                                                 "4k to 5k",
                                                 "5k to 7k",
                                                 "7k to 10k",
                                                 "10k to 20k",
                                                 "20k to 50k",
                                                 "50k to 100k",
                                                 "100k+",
                                                 "NA"))
    )
  return(filtered_data)
}

Function used to generate the plot
# Function for creating the ggplot map plot
# Uses filtered_data[[year]], created earlier, as the data source,
# coo_name as map_id, refugees_m as the fill aesthetic and world as the map
# A second map layer for highlighted countries is kept commented out in the function
# input  : year - integer
# output : world_plot - plot object
# Assuming filtered_data is a list of data frames, one for each year;
# rows without a country name are dropped before plotting
filtered_data <- lapply(filtered_data, function(df) {
  df %>%
    filter(!is.na(coo_name))
})
generateRefugeePlot <- function(year) {
  world_plot <- ggplot(filtered_data[[as.character(year)]], aes(map_id = coo_name)) +
    geom_map(
      aes(fill = refugees_m),
      map = world,
      color = "#B2BEB5",
      linewidth = 0.25,
      linetype = "blank"
    ) +
    # geom_map(
    #   data = highlight_filtered_data[[as.character(year)]],
    #   aes(map_id = coo_name, fill = refugees_m),
    #   map = highlight_world,
    #   color = "#71797E",
    #   show.legend = F
    # ) +
    expand_limits(x = world$long, y = world$lat) +
    scale_fill_manual(values = color_mapping, na.value = "#F2F3F4") +
    coord_fixed(ratio = 1) +
    labs(
      title = paste("Number of Refugees by Country in", year),
      subtitle = "Migrated to USA",
      caption = "Data source: TidyTuesday",
      fill = "Number of refugees"
    ) +
    theme_void() +
    theme(
      legend.position = "bottom",
      legend.direction = "horizontal",
      plot.title = element_text(size = 19, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = 15, color = "azure4", hjust = 0.5),
      plot.caption = element_text(size = 12, color = "azure4", hjust = 0.95)
    ) +
    guides(
      fill = guide_legend(
        nrow = 1,
        direction = "horizontal",
        title.position = "top",
        title.hjust = 0.5,
        label.position = "bottom",
        label.hjust = 1,
        label.vjust = 1,
        label.theme = element_text(lineheight = 0.25, size = 9),
        keywidth = 1,
        keyheight = 0.5
      )
    )
  return(world_plot)
}

Discussion
We analyse refugee patterns over time across nations using the 'year', 'coo_name', 'coa_name', and 'refugees' variables. To get a clearer picture of the effects, we gather annual refugee data by nation and normalize it against population statistics. Time-series techniques are used to find trends and link them to important world events, including wars, climate disasters, and changes in the economy. Our goal is to quantify the influence of these events on displacement by examining refugee trends around them. External data, such as conflict histories, records of climate events, economic indicators, and epidemic timelines, provide the context needed for a fuller picture of the variables influencing global refugee movements.
This analysis of the {refugees} dataset highlights the significant influence of outside factors on patterns of worldwide displacement between 2010 and 2022. It shows how major changes in refugee and displacement patterns have been driven by geopolitical conflicts, natural disasters, and global crises. The results emphasize both the urgency of dealing with the underlying causes of displacement and the importance of strong, well-informed responses to the problems associated with forced migration. Data-driven analyses of this kind are essential for developing humanitarian actions and policies that effectively address the world’s growing displacement crisis.
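Examining refugee trends around major events, as described in the discussion, could start from year-over-year changes in the yearly totals. The sketch below is only an illustration of that first step: the numbers are made up, and using `lag()` to compare each year with the previous one is an assumed approach rather than the method used in this project.

```r
library(dplyr)

# Made-up yearly totals (in thousands), for illustration only
yearly_totals <- tibble(
  year           = 2018:2022,
  total_refugees = c(100, 110, 99, 120, 150)
)

yoy_change <- yearly_totals |>
  arrange(year) |>
  mutate(
    # absolute change relative to the previous year (NA for the first year)
    change     = total_refugees - lag(total_refugees),
    # the same change as a percentage of the previous year's total
    pct_change = 100 * change / lag(total_refugees)
  )

print(yoy_change)
```

Years with unusually large changes can then be compared against external timelines of conflicts, climate events, or health crises.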
References
Title: Refugees, Source: tidytuesday, Link: https://github.com/rfordatascience/tidytuesday/blob/master/data/2023/2023-08-22/readme.md
Quarto, For documentation and presentation - Quarto
ggplot, For understanding of different plot - ggplot